Visualisation of excess mortality over Europe in 2020 using Eurostat data

The purpose of this notebook is to dynamically generate maps of weekly excess mortality over Europe NUTS2 or NUTS3 regions, depending on their availability.

Data sources:

  • Eurostat data on "Deaths by age group, sex, week and NUTS 3 region" demo_r_mweek3,
  • Eurostat data on "Deaths by week and sex" demo_r_mwk_ts (not used here),
  • Eurostat geographical data on regional units NUTS 2016 (see GISCO website).

Notes:

  • We provide here with some quick and dirty material to fetch the data from the provider (e.g., in a bulk manner, not using the API) and compute basic descriptive statistics (e.g., an excess rate). This will need to be improved in order to further automate it.
  • This notebook was created on May 13th, 2020. Therefore, the initial results presented here refer to the data extracted from Eurostat website at that date. Note: updated on June 26th, 2020.
  • On June 24th, 2020, Eurostat released a press article on 2020 data on weekly deaths.
  • On June 26th, 2020, Eurostat released a Statistics Explained article on weekly death statistics.

Settings

In [1]:
_THISDIR_ = !pwd
print('Current working directory: %s' % _THISDIR_)
Current working directory: ['/Users/gjacopo/DevOps/morbstat']

Let's import all we need, starting with the basic packages:

In [2]:
import requests
import io, os, re, sys
import warnings
import copy, functools
import datetime
import zipfile

as well as common data handling mocdules pandas and numpy:

In [3]:
import json

import pandas as pd
pd.set_option('mode.chained_assignment', None) # ignore SettingWithCopyWarning
import numpy as np

We also perform all geometric/set operations of vector layers using the geopandas package:

In [4]:
try:
    import geopandas as gpd
except ImportError:
    try:
        !{sys.executable} -m pip install geopandas
    except:
        print("! Package geopandas not installed !")
    else:
        print("! Package geopandas installed on-the-fly !")
        import geopandas as gpd
finally:
    warnings.filterwarnings('ignore', 'GeoSeries.notna', UserWarning)

try:
    from shapely import geometry
except ImportError:
    try:
        !{sys.executable} -m pip install shapely
    except:
        print("! Package geopandas not installed !")
    else:
        print("! Package geopandas installed on-the-fly !")
        from shapely import geometry

For visualisation purpose, the package matplotlib is used:

In [5]:
try:
    import matplotlib
except ImportError:
    raise IOError("Guess what: you're doomed...")
else:
    import matplotlib.pyplot as mplt
    import matplotlib.dates as mdates
    from matplotlib.ticker import FuncFormatter, MaxNLocator, IndexLocator
finally:
    _FIGSIZE_, _DPI_ = (7,4), 140 # just some default display size/resolution inside this notebook...
%matplotlib inline

Last, for interactive mapping, we use folium while branca is used for building a colormpa similar to that of the Statistics Explained article mentioned above:

In [6]:
try:
    import folium
except ImportError:
    try:
        !{sys.executable} -m pip install folium
    except:
        print("! Package folium not installed !")
    else:
        print("! Package folium installed on-the-fly !")
        import folium

try:
    import branca
except ImportError:
    try:
        !{sys.executable} -m pip install branca
    except:
        print("! Package branca not installed !")
    else:
        print("! Package branca installed on-the-fly !")
finally:
    import branca.colormap as bcm

(Dirty) data ingestion

Data on death figures will be fetched from Eurostat bulk website. The link to the data source in TSV format is : https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_mweek3.tsv.gz.

In [7]:
bulk_domain = 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1'
death_file = 'demo_r_mweek3'
death_ext = 'tsv.gz'
death_url = '{}&file=data%2F{}.{}'.format(bulk_domain, death_file, death_ext)

print("URL for death data: \033[1m%s\033[0m" % death_url)
URL for death data: https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_mweek3.tsv.gz
In [8]:
death_source = '%s.%s' % (death_file, death_ext)

try : 
    dest = os.path.join(_THISDIR_[0], ('.').join(death_source.split('.')[:-1]))
    assert os.path.exists(dest)
except:
    try:
        !wget -O $death_source "$death_url"
        !gunzip -f $death_source
    except:
        raise IOError("Error fetching & unzipping the data...")
    else:
        print('Data are loaded on disk in \033[1m%s\033[0m' % dest)
else:
    print('Data alredy on disk in \033[1m%s\033[0m' % dest)
finally:
    death_ext = death_ext.split('.')[0]
    death_source = dest
--2020-06-29 10:14:38--  https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_mweek3.tsv.gz
Resolving ec.europa.eu (ec.europa.eu)... 2a01:7080:24:100::666:30, 2a01:7080:14:100::666:30, 147.67.34.30, ...
Connecting to ec.europa.eu (ec.europa.eu)|2a01:7080:24:100::666:30|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20177983 (19M) [application/octet-stream]
Saving to: ‘demo_r_mweek3.tsv.gz’

demo_r_mweek3.tsv.g 100%[===================>]  19.24M  1.88MB/s    in 9.8s    

2020-06-29 10:14:48 (1.95 MB/s) - ‘demo_r_mweek3.tsv.gz’ saved [20177983/20177983]

Data are loaded on disk in /Users/gjacopo/DevOps/morbstat/demo_r_mweek3.tsv

Ibid with the geographical resources that you can retrieve from GISCO API, e.g. https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip:

In [9]:
gisco_domain = 'https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download'
nuts_year = 2016
nuts_res = 60
nuts_file = 'ref-nuts-%s-%sm' % (nuts_year, nuts_res) 
nuts_ext = 'geojson.zip'
nuts_url = '%s/%s.%s' %  (gisco_domain, nuts_file, nuts_ext)

print("URL for geographical data: \033[1m%s\033[0m" % nuts_url)
URL for geographical data: https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip
In [10]:
nuts_source = '%s.%s' % (nuts_file, nuts_ext)
nuts_dir = 'NUTS' 

try : 
    dest = os.path.join(_THISDIR_[0],nuts_dir)
    assert os.path.exists(dest)
except:
    try:
        !wget -O $nuts_source "$nuts_url"
        !mkdir $nuts_dir
        !unzip -u -d $nuts_dir $nuts_source        
    except:
        raise IOError("Error fetching & unzipping the data...")
    else:
        print('Data are loaded on disk in directory: \033[1m%s\033[0m' % dest)
else:
    print('Data already loaded on disk in directory: \033[1m%s\033[0m' % dest)
finally:
    nuts_ext = nuts_ext.split('.')[0]
    nuts_source = nuts_dir
--2020-06-29 10:14:52--  https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip
Resolving ec.europa.eu (ec.europa.eu)... 2a01:7080:24:100::666:30, 2a01:7080:14:100::666:30, 147.67.34.30, ...
Connecting to ec.europa.eu (ec.europa.eu)|2a01:7080:24:100::666:30|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip [following]
--2020-06-29 10:14:52--  https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip
Resolving gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)... 40.113.93.170
Connecting to gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)|40.113.93.170|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2588622 (2.5M) [application/zip]
Saving to: ‘ref-nuts-2016-60m.geojson.zip’

ref-nuts-2016-60m.g 100%[===================>]   2.47M  1.76MB/s    in 1.4s    

2020-06-29 10:14:53 (1.76 MB/s) - ‘ref-nuts-2016-60m.geojson.zip’ saved [2588622/2588622]

Archive:  ref-nuts-2016-60m.geojson.zip
  inflating: NUTS/NUTS_RG_60M_2016_3035.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3035_LEVL_0.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3035_LEVL_1.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3035_LEVL_2.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3035_LEVL_3.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3857.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3857_LEVL_0.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3857_LEVL_1.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3857_LEVL_2.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_3857_LEVL_3.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_4326.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_4326_LEVL_0.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_4326_LEVL_1.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_4326_LEVL_2.geojson  
  inflating: NUTS/NUTS_RG_60M_2016_4326_LEVL_3.geojson  
  inflating: NUTS/NUTS_LB_2016_3035.geojson  
  inflating: NUTS/NUTS_LB_2016_3035_LEVL_0.geojson  
  inflating: NUTS/NUTS_LB_2016_3035_LEVL_1.geojson  
  inflating: NUTS/NUTS_LB_2016_3035_LEVL_2.geojson  
  inflating: NUTS/NUTS_LB_2016_3035_LEVL_3.geojson  
  inflating: NUTS/NUTS_LB_2016_3857.geojson  
  inflating: NUTS/NUTS_LB_2016_3857_LEVL_0.geojson  
  inflating: NUTS/NUTS_LB_2016_3857_LEVL_1.geojson  
  inflating: NUTS/NUTS_LB_2016_3857_LEVL_2.geojson  
  inflating: NUTS/NUTS_LB_2016_3857_LEVL_3.geojson  
  inflating: NUTS/NUTS_LB_2016_4326.geojson  
  inflating: NUTS/NUTS_LB_2016_4326_LEVL_0.geojson  
  inflating: NUTS/NUTS_LB_2016_4326_LEVL_1.geojson  
  inflating: NUTS/NUTS_LB_2016_4326_LEVL_2.geojson  
  inflating: NUTS/NUTS_LB_2016_4326_LEVL_3.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3035.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3035_LEVL_0.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3035_LEVL_1.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3035_LEVL_2.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3035_LEVL_3.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3857.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3857_LEVL_0.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3857_LEVL_1.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3857_LEVL_2.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_3857_LEVL_3.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_4326.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_4326_LEVL_0.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_4326_LEVL_1.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_4326_LEVL_2.geojson  
  inflating: NUTS/NUTS_BN_60M_2016_4326_LEVL_3.geojson  
  inflating: NUTS/NUTS_AT_2016.csv   
  inflating: NUTS/NUTS_RG_BN_60M_2016.csv  
  inflating: NUTS/metadata.pdf       
  inflating: NUTS/metadata.xml       
  inflating: NUTS/release-notes.txt  
Data are loaded on disk in directory: /Users/gjacopo/DevOps/morbstat/NUTS

Fetch the URL to get the response:

Data preparation

We first explore the NUTS data. Actually we consider NUTS region boundaries at all levels (LEVELS), from 0 to 3, since some data are available at country level. Hence we 'store' one vector dataset per NUTS level (nuts_data):

In [11]:
LEVELS = [0,1,2,3]

files = dict.fromkeys(LEVELS)
[files.update({l: 'NUTS_RG_%sM_%s_3035_LEVL_%s.%s' % (nuts_res, nuts_year, l, nuts_ext)}) 
 for l in LEVELS]

nuts_data = dict.fromkeys(LEVELS)
[nuts_data.update({l: gpd.read_file(os.path.join(_THISDIR_[0],nuts_source,files[l]), driver='GeoJSON')}) 
 for l in LEVELS]

print("Geographical datasets: \033[1m%s\033[0m loaded" % list(files.values()))
print("Projection: \033[1m%s\033[0m" % nuts_data[LEVELS[0]].crs)

nuts_data[LEVELS[1]].head(5)
Geographical datasets: ['NUTS_RG_60M_2016_3035_LEVL_0.geojson', 'NUTS_RG_60M_2016_3035_LEVL_1.geojson', 'NUTS_RG_60M_2016_3035_LEVL_2.geojson', 'NUTS_RG_60M_2016_3035_LEVL_3.geojson'] loaded
Projection: epsg:3035
Out[11]:
id COAST_TYPE MOUNT_TYPE CNTR_CODE FID NUTS_ID NUTS_NAME LEVL_CODE URBN_TYPE geometry
0 HU1 0 0 HU HU1 HU1 KÖZÉP-MAGYARORSZÁG 1 0 POLYGON ((5046179.000 2766903.000, 5086597.000...
1 HU2 0 0 HU HU2 HU2 DUNÁNTÚL 1 0 POLYGON ((4897845.000 2768465.000, 4912120.000...
2 HU3 0 0 HU HU3 HU3 ALFÖLD ÉS ÉSZAK 1 0 POLYGON ((5214660.000 2880854.000, 5216710.000...
3 CY0 0 0 CY CY0 CY0 ΚΥΠΡΟΣ 1 0 MULTIPOLYGON (((6467380.000 1643362.000, 64132...
4 CZ0 0 0 CZ CZ0 CZ0 ČESKÁ REPUBLIKA 1 0 POLYGON ((4635755.000 3113255.000, 4645979.000...

Let's have a "quick" render of the NUTS data:

In [12]:
level = LEVELS[2]

f, ax = mplt.subplots(1, figsize=(16, 16))
nuts_data[level].plot(ax=ax)
ax.axes.get_xaxis().set_visible(False); ax.axes.get_yaxis().set_visible(False)
ax.set_title('NUTS at level %s' % level) 
f.tight_layout()
mplt.show()

Following, we also integrate the mortality data (death_data):

In [13]:
death_data = pd.read_csv(death_source, sep='\t',encoding='latin1')

print('Dimensions of the table: %s' % list(death_data.shape))
death_data.head(5)
Dimensions of the table: [67038, 1069]
Out[13]:
unit,sex,age,geo\time 2020W25 2020W24 2020W23 2020W22 2020W21 2020W20 2020W19 2020W18 2020W17 ... 2000W10 2000W09 2000W08 2000W07 2000W06 2000W05 2000W04 2000W03 2000W02 2000W01
0 NR,F,TOTAL,AT : : : 729 p 724 p 735 p 748 p 759 p 791 p ... 834 874 908 914 967 1071 1076 1141 1062 1053
1 NR,F,TOTAL,AT1 : : : 343 p 328 p 328 p 355 p 355 p 350 p ... 390 418 447 446 428 496 501 596 577 545
2 NR,F,TOTAL,AT11 : : : 26 p 28 p 29 p 25 p 41 p 25 p ... 33 41 37 32 37 40 42 39 53 35
3 NR,F,TOTAL,AT12 : : : 162 p 152 p 164 p 165 p 157 p 174 p ... 168 186 179 182 184 208 205 251 225 225
4 NR,F,TOTAL,AT13 : : : 155 p 148 p 135 p 165 p 157 p 151 p ... 189 191 231 232 207 248 254 306 299 285

5 rows × 1069 columns

We retrieve some basic info, e.g. the number of weekly acquisitions available (WEEK_COLS), the temporal coverage (YEARS), including the current/last one (YCUR) and weeks covered (WEEKS) during that year:

In [14]:
WEEK_COLS = [col for col in death_data.columns if re.search('W',col) is not None]
print('Number of weekly observations: %s' % len(WEEK_COLS))

YEARS = list(set([int(col.split('W')[0]) for col in WEEK_COLS]))
print("Years present in the dataset: \033[1m%s\033[0m" % YEARS)

YCUR = max(YEARS)
print("Current year: \033[1m%s\033[0m" % YCUR)

WEEKS = list(set([int(col.split('W')[1]) for col in WEEK_COLS if col.startswith(str(YCUR))]))
print("Weeks available in current year: \033[1m%s\033[0m" % WEEKS)
Number of weekly observations: 1068
Years present in the dataset: [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Current year: 2020
Weeks available in current year: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

Say that we are looking to a specific period, we know already that we will not be using all the data. In that case we specify the starting year (YSTART) of the study. All data collected before that date will be discarded:

In [15]:
YSTART = 2016

try:
    assert YSTART in YEARS
except:
    raise IOError("Select another start year in \033[1m%s\033[0m" % YEARS)
else:
    CYEARS, RYEARS = range(YSTART,max(YEARS)), range(YSTART,max(YEARS)+1)

DISCARD_COLS = [col for col in WEEK_COLS if int(col.split('W')[0])<YSTART]
COLUMNS = [col for col in death_data.columns if not re.search('W', col) or int(col.split('W')[0])>=YSTART]

WEEK_COLS = list(set(WEEK_COLS).difference(set(DISCARD_COLS)))
print('Updated number of quarterly observations: %s' % len(WEEK_COLS))

if len(DISCARD_COLS) < len(COLUMNS):
    death_data.drop(columns=DISCARD_COLS, inplace=True)
else:
    death_data = death_data[COLUMNS]
    
print("Updated list of variables considered for this analysis: \033[1m%s\033[0m" 
      % COLUMNS) # % list(death_data.columns)
Updated number of quarterly observations: 233
Updated list of variables considered for this analysis: ['unit,sex,age,geo\\time', '2020W25 ', '2020W24 ', '2020W23 ', '2020W22 ', '2020W21 ', '2020W20 ', '2020W19 ', '2020W18 ', '2020W17 ', '2020W16 ', '2020W15 ', '2020W14 ', '2020W13 ', '2020W12 ', '2020W11 ', '2020W10 ', '2020W09 ', '2020W08 ', '2020W07 ', '2020W06 ', '2020W05 ', '2020W04 ', '2020W03 ', '2020W02 ', '2020W01 ', '2019W52 ', '2019W51 ', '2019W50 ', '2019W49 ', '2019W48 ', '2019W47 ', '2019W46 ', '2019W45 ', '2019W44 ', '2019W43 ', '2019W42 ', '2019W41 ', '2019W40 ', '2019W39 ', '2019W38 ', '2019W37 ', '2019W36 ', '2019W35 ', '2019W34 ', '2019W33 ', '2019W32 ', '2019W31 ', '2019W30 ', '2019W29 ', '2019W28 ', '2019W27 ', '2019W26 ', '2019W25 ', '2019W24 ', '2019W23 ', '2019W22 ', '2019W21 ', '2019W20 ', '2019W19 ', '2019W18 ', '2019W17 ', '2019W16 ', '2019W15 ', '2019W14 ', '2019W13 ', '2019W12 ', '2019W11 ', '2019W10 ', '2019W09 ', '2019W08 ', '2019W07 ', '2019W06 ', '2019W05 ', '2019W04 ', '2019W03 ', '2019W02 ', '2019W01 ', '2018W52 ', '2018W51 ', '2018W50 ', '2018W49 ', '2018W48 ', '2018W47 ', '2018W46 ', '2018W45 ', '2018W44 ', '2018W43 ', '2018W42 ', '2018W41 ', '2018W40 ', '2018W39 ', '2018W38 ', '2018W37 ', '2018W36 ', '2018W35 ', '2018W34 ', '2018W33 ', '2018W32 ', '2018W31 ', '2018W30 ', '2018W29 ', '2018W28 ', '2018W27 ', '2018W26 ', '2018W25 ', '2018W24 ', '2018W23 ', '2018W22 ', '2018W21 ', '2018W20 ', '2018W19 ', '2018W18 ', '2018W17 ', '2018W16 ', '2018W15 ', '2018W14 ', '2018W13 ', '2018W12 ', '2018W11 ', '2018W10 ', '2018W09 ', '2018W08 ', '2018W07 ', '2018W06 ', '2018W05 ', '2018W04 ', '2018W03 ', '2018W02 ', '2018W01 ', '2017W52 ', '2017W51 ', '2017W50 ', '2017W49 ', '2017W48 ', '2017W47 ', '2017W46 ', '2017W45 ', '2017W44 ', '2017W43 ', '2017W42 ', '2017W41 ', '2017W40 ', '2017W39 ', '2017W38 ', '2017W37 ', '2017W36 ', '2017W35 ', '2017W34 ', '2017W33 ', '2017W32 ', '2017W31 ', '2017W30 ', '2017W29 ', '2017W28 ', '2017W27 ', '2017W26 ', '2017W25 ', '2017W24 ', '2017W23 ', '2017W22 ', '2017W21 ', '2017W20 ', '2017W19 ', '2017W18 ', '2017W17 ', '2017W16 ', '2017W15 ', '2017W14 ', '2017W13 ', '2017W12 ', '2017W11 ', '2017W10 ', '2017W09 ', '2017W08 ', '2017W07 ', '2017W06 ', '2017W05 ', '2017W04 ', '2017W03 ', '2017W02 ', '2017W01 ', '2016W52 ', '2016W51 ', '2016W50 ', '2016W49 ', '2016W48 ', '2016W47 ', '2016W46 ', '2016W45 ', '2016W44 ', '2016W43 ', '2016W42 ', '2016W41 ', '2016W40 ', '2016W39 ', '2016W38 ', '2016W37 ', '2016W36 ', '2016W35 ', '2016W34 ', '2016W33 ', '2016W32 ', '2016W31 ', '2016W30 ', '2016W29 ', '2016W28 ', '2016W27 ', '2016W26 ', '2016W25 ', '2016W24 ', '2016W23 ', '2016W22 ', '2016W21 ', '2016W20 ', '2016W19 ', '2016W18 ', '2016W17 ', '2016W16 ', '2016W15 ', '2016W14 ', '2016W13 ', '2016W12 ', '2016W11 ', '2016W10 ', '2016W09 ', '2016W08 ', '2016W07 ', '2016W06 ', '2016W05 ', '2016W04 ', '2016W03 ', '2016W02 ', '2016W01 ']

Since we also notice the columns' names include blanks, we reformat them (while we update the list):

In [16]:
death_data.rename(columns=lambda x: x.strip(), inplace=True)
# print("Stripped data column names are: \033[1m%s\033[0m" % list(death_data.columns))

WEEK_COLS = [col.strip() for col in WEEK_COLS]

In tsv format the indexing variables are actually merged in the first column ('unit,sex,age,geo\\time'). Let's reformat the dataset (tsv_prepare):

In [17]:
def tsv_prepare(df):
    first_col = df.columns[0]
    cols = first_col.split('\\')[0].split(',')
    def split_column(col):
        return col.split(',')
    df[cols] = df.apply(lambda row: pd.Series(split_column(row[first_col])), axis=1)
    return df.drop(columns=first_col)

death_data = tsv_prepare(death_data)
# print("Data column names are: \033[1m%s\033[0m" % list(death_data.columns))

death_data.head(5)
Out[17]:
2020W25 2020W24 2020W23 2020W22 2020W21 2020W20 2020W19 2020W18 2020W17 2020W16 ... 2016W06 2016W05 2016W04 2016W03 2016W02 2016W01 unit sex age geo
0 : : : 729 p 724 p 735 p 748 p 759 p 791 p 903 p ... 882 836 857 814 823 853 NR F TOTAL AT
1 : : : 343 p 328 p 328 p 355 p 355 p 350 p 406 p ... 402 392 395 347 380 387 NR F TOTAL AT1
2 : : : 26 p 28 p 29 p 25 p 41 p 25 p 42 p ... 28 31 42 26 38 32 NR F TOTAL AT11
3 : : : 162 p 152 p 164 p 165 p 157 p 174 p 175 p ... 206 181 178 156 167 160 NR F TOTAL AT12
4 : : : 155 p 148 p 135 p 165 p 157 p 151 p 189 p ... 168 180 175 165 175 195 NR F TOTAL AT13

5 rows × 237 columns

We also retrieve available regions (REGIONS):

In [18]:
REGIONS = death_data['geo'].unique().tolist()

print("NUTS regions available: \033[1m%s\033[0m" % REGIONS)
NUTS regions available: ['AT', 'AT1', 'AT11', 'AT12', 'AT13', 'AT2', 'AT21', 'AT22', 'AT3', 'AT31', 'AT32', 'AT33', 'AT34', 'BE', 'BE1', 'BE10', 'BE100', 'BE2', 'BE21', 'BE211', 'BE212', 'BE213', 'BE22', 'BE221', 'BE222', 'BE223', 'BE23', 'BE231', 'BE232', 'BE233', 'BE234', 'BE235', 'BE236', 'BE24', 'BE241', 'BE242', 'BE25', 'BE251', 'BE252', 'BE253', 'BE254', 'BE255', 'BE256', 'BE257', 'BE258', 'BE3', 'BE31', 'BE310', 'BE32', 'BE321', 'BE322', 'BE323', 'BE324', 'BE325', 'BE326', 'BE327', 'BE33', 'BE331', 'BE332', 'BE334', 'BE335', 'BE336', 'BE34', 'BE341', 'BE342', 'BE343', 'BE344', 'BE345', 'BE35', 'BE351', 'BE352', 'BE353', 'BG', 'BG3', 'BG31', 'BG311', 'BG312', 'BG313', 'BG314', 'BG315', 'BG32', 'BG321', 'BG322', 'BG323', 'BG324', 'BG325', 'BG33', 'BG331', 'BG332', 'BG333', 'BG334', 'BG34', 'BG341', 'BG342', 'BG343', 'BG344', 'BG4', 'BG41', 'BG411', 'BG412', 'BG413', 'BG414', 'BG415', 'BG42', 'BG421', 'BG422', 'BG423', 'BG424', 'BG425', 'CH', 'CH0', 'CH01', 'CH011', 'CH012', 'CH013', 'CH02', 'CH021', 'CH022', 'CH023', 'CH024', 'CH025', 'CH03', 'CH031', 'CH032', 'CH033', 'CH04', 'CH040', 'CH05', 'CH051', 'CH052', 'CH053', 'CH054', 'CH055', 'CH056', 'CH057', 'CH06', 'CH061', 'CH062', 'CH063', 'CH064', 'CH065', 'CH066', 'CH07', 'CH070', 'CZ', 'CZ0', 'CZ01', 'CZ010', 'CZ02', 'CZ020', 'CZ03', 'CZ031', 'CZ032', 'CZ04', 'CZ041', 'CZ042', 'CZ05', 'CZ051', 'CZ052', 'CZ053', 'CZ06', 'CZ063', 'CZ064', 'CZ07', 'CZ071', 'CZ072', 'CZ08', 'CZ080', 'DK', 'DK0', 'DK01', 'DK011', 'DK012', 'DK013', 'DK014', 'DK02', 'DK021', 'DK022', 'DK03', 'DK031', 'DK032', 'DK04', 'DK041', 'DK042', 'DK05', 'DK050', 'EE', 'EE0', 'EE00', 'EE001', 'EE004', 'EE006', 'EE007', 'EE008', 'ES', 'ES1', 'ES11', 'ES111', 'ES112', 'ES113', 'ES114', 'ES12', 'ES120', 'ES13', 'ES130', 'ES2', 'ES21', 'ES211', 'ES212', 'ES213', 'ES22', 'ES220', 'ES23', 'ES230', 'ES24', 'ES241', 'ES242', 'ES243', 'ES3', 'ES30', 'ES300', 'ES4', 'ES41', 'ES411', 'ES412', 'ES413', 'ES414', 'ES415', 'ES416', 'ES417', 'ES418', 'ES419', 'ES42', 'ES421', 'ES422', 'ES423', 'ES424', 'ES425', 'ES43', 'ES431', 'ES432', 'ES5', 'ES51', 'ES511', 'ES512', 'ES513', 'ES514', 'ES52', 'ES521', 'ES522', 'ES523', 'ES53', 'ES531', 'ES532', 'ES533', 'ES6', 'ES61', 'ES611', 'ES612', 'ES613', 'ES614', 'ES615', 'ES616', 'ES617', 'ES618', 'ES62', 'ES620', 'ES63', 'ES630', 'ES64', 'ES640', 'ES7', 'ES70', 'ES703', 'ES704', 'ES705', 'ES706', 'ES707', 'ES708', 'ES709', 'FI', 'FI1', 'FI19', 'FI193', 'FI194', 'FI195', 'FI196', 'FI197', 'FI1B', 'FI1B1', 'FI1C', 'FI1C1', 'FI1C2', 'FI1C3', 'FI1C4', 'FI1C5', 'FI1D', 'FI1D1', 'FI1D2', 'FI1D3', 'FI1D5', 'FI1D7', 'FI1D8', 'FI1D9', 'FI2', 'FI20', 'FI200', 'FR', 'FR1', 'FR10', 'FR101', 'FR102', 'FR103', 'FR104', 'FR105', 'FR106', 'FR107', 'FR108', 'FRB', 'FRB0', 'FRB01', 'FRB02', 'FRB03', 'FRB04', 'FRB05', 'FRB06', 'FRC', 'FRC1', 'FRC11', 'FRC12', 'FRC13', 'FRC14', 'FRC2', 'FRC21', 'FRC22', 'FRC23', 'FRC24', 'FRD', 'FRD1', 'FRD11', 'FRD12', 'FRD13', 'FRD2', 'FRD21', 'FRD22', 'FRE', 'FRE1', 'FRE11', 'FRE12', 'FRE2', 'FRE21', 'FRE22', 'FRE23', 'FRF', 'FRF1', 'FRF11', 'FRF12', 'FRF2', 'FRF21', 'FRF22', 'FRF23', 'FRF24', 'FRF3', 'FRF31', 'FRF32', 'FRF33', 'FRF34', 'FRG', 'FRG0', 'FRG01', 'FRG02', 'FRG03', 'FRG04', 'FRG05', 'FRH', 'FRH0', 'FRH01', 'FRH02', 'FRH03', 'FRH04', 'FRI', 'FRI1', 'FRI11', 'FRI12', 'FRI13', 'FRI14', 'FRI15', 'FRI2', 'FRI21', 'FRI22', 'FRI23', 'FRI3', 'FRI31', 'FRI32', 'FRI33', 'FRI34', 'FRJ', 'FRJ1', 'FRJ11', 'FRJ12', 'FRJ13', 'FRJ14', 'FRJ15', 'FRJ2', 'FRJ21', 'FRJ22', 'FRJ23', 'FRJ24', 'FRJ25', 'FRJ26', 'FRJ27', 'FRJ28', 'FRK', 'FRK1', 'FRK11', 'FRK12', 'FRK13', 'FRK14', 'FRK2', 'FRK21', 'FRK22', 'FRK23', 'FRK24', 'FRK25', 'FRK26', 'FRK27', 'FRK28', 'FRL', 'FRL0', 'FRL01', 'FRL02', 'FRL03', 'FRL04', 'FRL05', 'FRL06', 'FRM', 'FRM0', 'FRM01', 'FRM02', 'FRY', 'FRY1', 'FRY10', 'FRY2', 'FRY20', 'FRY3', 'FRY30', 'FRY4', 'FRY40', 'FRY5', 'FRY50', 'HR', 'HU', 'HU1', 'HU11', 'HU110', 'HU12', 'HU120', 'HU2', 'HU21', 'HU211', 'HU212', 'HU213', 'HU22', 'HU221', 'HU222', 'HU223', 'HU23', 'HU231', 'HU232', 'HU233', 'HU3', 'HU31', 'HU311', 'HU312', 'HU313', 'HU32', 'HU321', 'HU322', 'HU323', 'HU33', 'HU331', 'HU332', 'HU333', 'HUX', 'HUXX', 'HUXXX', 'IS', 'IS0', 'IS00', 'IS001', 'IS002', 'IT', 'ITC', 'ITC1', 'ITC11', 'ITC12', 'ITC13', 'ITC14', 'ITC15', 'ITC16', 'ITC17', 'ITC18', 'ITC2', 'ITC20', 'ITC3', 'ITC31', 'ITC32', 'ITC33', 'ITC34', 'ITC4', 'ITC41', 'ITC42', 'ITC43', 'ITC44', 'ITC46', 'ITC47', 'ITC48', 'ITC49', 'ITC4A', 'ITC4B', 'ITC4C', 'ITC4D', 'ITF', 'ITF1', 'ITF11', 'ITF12', 'ITF13', 'ITF14', 'ITF2', 'ITF21', 'ITF22', 'ITF3', 'ITF31', 'ITF32', 'ITF33', 'ITF34', 'ITF35', 'ITF4', 'ITF43', 'ITF44', 'ITF45', 'ITF46', 'ITF47', 'ITF48', 'ITF5', 'ITF51', 'ITF52', 'ITF6', 'ITF61', 'ITF62', 'ITF63', 'ITF64', 'ITF65', 'ITG', 'ITG1', 'ITG11', 'ITG12', 'ITG13', 'ITG14', 'ITG15', 'ITG16', 'ITG17', 'ITG18', 'ITG19', 'ITG2', 'ITG25', 'ITG26', 'ITG27', 'ITG28', 'ITG29', 'ITG2A', 'ITG2B', 'ITG2C', 'ITH', 'ITH1', 'ITH10', 'ITH2', 'ITH20', 'ITH3', 'ITH31', 'ITH32', 'ITH33', 'ITH34', 'ITH35', 'ITH36', 'ITH37', 'ITH4', 'ITH41', 'ITH42', 'ITH43', 'ITH44', 'ITH5', 'ITH51', 'ITH52', 'ITH53', 'ITH54', 'ITH55', 'ITH56', 'ITH57', 'ITH58', 'ITH59', 'ITI', 'ITI1', 'ITI11', 'ITI12', 'ITI13', 'ITI14', 'ITI15', 'ITI16', 'ITI17', 'ITI18', 'ITI19', 'ITI1A', 'ITI2', 'ITI21', 'ITI22', 'ITI3', 'ITI31', 'ITI32', 'ITI33', 'ITI34', 'ITI35', 'ITI4', 'ITI41', 'ITI42', 'ITI43', 'ITI44', 'ITI45', 'LI', 'LI0', 'LI00', 'LI000', 'LT', 'LT0', 'LT01', 'LT011', 'LT02', 'LT021', 'LT022', 'LT023', 'LT024', 'LT025', 'LT026', 'LT027', 'LT028', 'LT029', 'LU', 'LU0', 'LU00', 'LU000', 'LV', 'LV0', 'LV00', 'LV003', 'LV005', 'LV006', 'LV007', 'LV008', 'LV009', 'LVX', 'LVXX', 'LVXXX', 'ME', 'ME0', 'ME00', 'ME000', 'NL', 'NL1', 'NL11', 'NL111', 'NL112', 'NL113', 'NL12', 'NL124', 'NL125', 'NL126', 'NL13', 'NL131', 'NL132', 'NL133', 'NL2', 'NL21', 'NL211', 'NL212', 'NL213', 'NL22', 'NL221', 'NL224', 'NL225', 'NL226', 'NL23', 'NL230', 'NL3', 'NL31', 'NL310', 'NL32', 'NL321', 'NL323', 'NL324', 'NL325', 'NL327', 'NL328', 'NL329', 'NL33', 'NL332', 'NL333', 'NL337', 'NL33A', 'NL33B', 'NL33C', 'NL34', 'NL341', 'NL342', 'NL4', 'NL41', 'NL411', 'NL412', 'NL413', 'NL414', 'NL42', 'NL421', 'NL422', 'NL423', 'NO', 'NO0', 'NO01', 'NO011', 'NO012', 'NO02', 'NO021', 'NO022', 'NO03', 'NO031', 'NO032', 'NO033', 'NO034', 'NO04', 'NO041', 'NO042', 'NO043', 'NO05', 'NO051', 'NO052', 'NO053', 'NO06', 'NO060', 'NO061', 'NO062', 'NO07', 'NO071', 'NO072', 'NO073', 'PT', 'PT1', 'PT11', 'PT111', 'PT112', 'PT119', 'PT11A', 'PT11B', 'PT11C', 'PT11D', 'PT11E', 'PT15', 'PT150', 'PT16', 'PT16B', 'PT16D', 'PT16E', 'PT16F', 'PT16G', 'PT16H', 'PT16I', 'PT16J', 'PT17', 'PT170', 'PT18', 'PT181', 'PT184', 'PT185', 'PT186', 'PT187', 'PT2', 'PT20', 'PT200', 'PT3', 'PT30', 'PT300', 'PTX', 'PTXX', 'PTXXX', 'RS', 'RS1', 'RS11', 'RS110', 'RS12', 'RS121', 'RS122', 'RS123', 'RS124', 'RS125', 'RS126', 'RS127', 'RS2', 'RS21', 'RS211', 'RS212', 'RS213', 'RS214', 'RS215', 'RS216', 'RS217', 'RS218', 'RS22', 'RS221', 'RS222', 'RS223', 'RS224', 'RS225', 'RS226', 'RS227', 'RS228', 'RS229', 'SE', 'SE1', 'SE11', 'SE110', 'SE12', 'SE121', 'SE122', 'SE123', 'SE124', 'SE125', 'SE2', 'SE21', 'SE211', 'SE212', 'SE213', 'SE214', 'SE22', 'SE221', 'SE224', 'SE23', 'SE231', 'SE232', 'SE3', 'SE31', 'SE311', 'SE312', 'SE313', 'SE32', 'SE321', 'SE322', 'SE33', 'SE331', 'SE332', 'SI', 'SK', 'SK0', 'SK01', 'SK010', 'SK02', 'SK021', 'SK022', 'SK023', 'SK03', 'SK031', 'SK032', 'SK04', 'SK041', 'SK042', 'UK', 'UKC', 'UKC1', 'UKC11', 'UKC12', 'UKC13', 'UKC14', 'UKC2', 'UKC21', 'UKC22', 'UKC23', 'UKD', 'UKD1', 'UKD11', 'UKD12', 'UKD3', 'UKD33', 'UKD34', 'UKD35', 'UKD36', 'UKD37', 'UKD4', 'UKD41', 'UKD42', 'UKD44', 'UKD45', 'UKD46', 'UKD47', 'UKD6', 'UKD61', 'UKD62', 'UKD63', 'UKD7', 'UKD71', 'UKD72', 'UKD73', 'UKD74', 'UKE', 'UKE1', 'UKE11', 'UKE12', 'UKE13', 'UKE2', 'UKE21', 'UKE22', 'UKE3', 'UKE31', 'UKE32', 'UKE4', 'UKE41', 'UKE42', 'UKE44', 'UKE45', 'UKF', 'UKF1', 'UKF11', 'UKF12', 'UKF13', 'UKF14', 'UKF15', 'UKF16', 'UKF2', 'UKF21', 'UKF22', 'UKF24', 'UKF25', 'UKF3', 'UKF30', 'UKG', 'UKG1', 'UKG11', 'UKG12', 'UKG13', 'UKG2', 'UKG21', 'UKG22', 'UKG23', 'UKG24', 'UKG3', 'UKG31', 'UKG32', 'UKG33', 'UKG36', 'UKG37', 'UKG38', 'UKG39', 'UKH', 'UKH1', 'UKH11', 'UKH12', 'UKH14', 'UKH15', 'UKH16', 'UKH17', 'UKH2', 'UKH21', 'UKH23', 'UKH24', 'UKH25', 'UKH3', 'UKH31', 'UKH32', 'UKH34', 'UKH35', 'UKH36', 'UKH37', 'UKI', 'UKI3', 'UKI31', 'UKI32', 'UKI33', 'UKI34', 'UKI4', 'UKI41', 'UKI42', 'UKI43', 'UKI44', 'UKI45', 'UKI5', 'UKI51', 'UKI52', 'UKI53', 'UKI54', 'UKI6', 'UKI61', 'UKI62', 'UKI63', 'UKI7', 'UKI71', 'UKI72', 'UKI73', 'UKI74', 'UKI75', 'UKJ', 'UKJ1', 'UKJ11', 'UKJ12', 'UKJ13', 'UKJ14', 'UKJ2', 'UKJ21', 'UKJ22', 'UKJ25', 'UKJ26', 'UKJ27', 'UKJ28', 'UKJ3', 'UKJ31', 'UKJ32', 'UKJ34', 'UKJ35', 'UKJ36', 'UKJ37', 'UKJ4', 'UKJ41', 'UKJ43', 'UKJ44', 'UKJ45', 'UKJ46', 'UKK', 'UKK1', 'UKK11', 'UKK12', 'UKK13', 'UKK14', 'UKK15', 'UKK2', 'UKK21', 'UKK22', 'UKK23', 'UKK3', 'UKK30', 'UKK4', 'UKK41', 'UKK42', 'UKK43', 'UKL', 'UKL1', 'UKL11', 'UKL12', 'UKL13', 'UKL14', 'UKL15', 'UKL16', 'UKL17', 'UKL18', 'UKL2', 'UKL21', 'UKL22', 'UKL23', 'UKL24', 'UKM', 'UKN', 'NLXXX', 'DE']

We need to further clean the data to get rid of the flags (note: this is actually important, and should be communicated with the data, however we omit it for this exercise). We also reformat missing values (:) into NaNs (flag_nan_clean):

In [19]:
def flag_nan_clean(df):
    def filter_cell(c):
        cstr = str(c)
        return np.nan if cstr.strip()==':' else (np.float(c.split(' ')[0]) if re.search(' ', cstr) else c)
    return df.applymap(filter_cell)

death_data = flag_nan_clean(death_data) # Zzzzzzz...

death_data.head(5)
Out[19]:
2020W25 2020W24 2020W23 2020W22 2020W21 2020W20 2020W19 2020W18 2020W17 2020W16 ... 2016W06 2016W05 2016W04 2016W03 2016W02 2016W01 unit sex age geo
0 NaN NaN NaN 729.0 724.0 735.0 748.0 759.0 791.0 903.0 ... 882.0 836.0 857.0 814.0 823.0 853.0 NR F TOTAL AT
1 NaN NaN NaN 343.0 328.0 328.0 355.0 355.0 350.0 406.0 ... 402.0 392.0 395.0 347.0 380.0 387.0 NR F TOTAL AT1
2 NaN NaN NaN 26.0 28.0 29.0 25.0 41.0 25.0 42.0 ... 28.0 31.0 42.0 26.0 38.0 32.0 NR F TOTAL AT11
3 NaN NaN NaN 162.0 152.0 164.0 165.0 157.0 174.0 175.0 ... 206.0 181.0 178.0 156.0 167.0 160.0 NR F TOTAL AT12
4 NaN NaN NaN 155.0 148.0 135.0 165.0 157.0 151.0 189.0 ... 168.0 180.0 175.0 165.0 175.0 195.0 NR F TOTAL AT13

5 rows × 237 columns

Last, we cast the data:

In [20]:
death_data[WEEK_COLS] = death_data[WEEK_COLS].astype(float)

death_data.head(5)
Out[20]:
2020W25 2020W24 2020W23 2020W22 2020W21 2020W20 2020W19 2020W18 2020W17 2020W16 ... 2016W06 2016W05 2016W04 2016W03 2016W02 2016W01 unit sex age geo
0 NaN NaN NaN 729.0 724.0 735.0 748.0 759.0 791.0 903.0 ... 882.0 836.0 857.0 814.0 823.0 853.0 NR F TOTAL AT
1 NaN NaN NaN 343.0 328.0 328.0 355.0 355.0 350.0 406.0 ... 402.0 392.0 395.0 347.0 380.0 387.0 NR F TOTAL AT1
2 NaN NaN NaN 26.0 28.0 29.0 25.0 41.0 25.0 42.0 ... 28.0 31.0 42.0 26.0 38.0 32.0 NR F TOTAL AT11
3 NaN NaN NaN 162.0 152.0 164.0 165.0 157.0 174.0 175.0 ... 206.0 181.0 178.0 156.0 167.0 160.0 NR F TOTAL AT12
4 NaN NaN NaN 155.0 148.0 135.0 165.0 157.0 151.0 189.0 ... 168.0 180.0 175.0 165.0 175.0 195.0 NR F TOTAL AT13

5 rows × 237 columns

Data exploration

For reminder:

In [21]:
print('Number of weekly observations: %s' % len(WEEK_COLS))
print("Years present in the dataset: \033[1m%s\033[0m" % YEARS)
print("Current year: \033[1m%s\033[0m" % YCUR)
print("Weeks available in current year: \033[1m%s\033[0m" % WEEKS)
Number of weekly observations: 233
Years present in the dataset: [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Current year: 2020
Weeks available in current year: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

We first have a look at which data/countries (DATA_NUTS_ID) are actually made available throughout the dataset death_date (note that this actually does not specify whether the data are NaN or not...):

In [22]:
KEY = 'NUTS_ID'

NUTS_ID = dict.fromkeys(LEVELS)
CTRY_ID = dict.fromkeys(LEVELS) 

for l in LEVELS:
    NUTS_ID.update({l: nuts_data[l][KEY].unique().tolist()})
    CTRY_ID.update({l: list(set([_id[:2] for _id in NUTS_ID[l]]))})
    
DATA_NUTS_ID = dict.fromkeys(LEVELS) 
all_id = death_data['geo'].unique().tolist()

print('Data are availalble:') 
for l in LEVELS:
    DATA_NUTS_ID.update({l: list(set([_id[:2] for _id in all_id if len(_id)==l+2]))})
    print('* NUTS level %s: \033[1m%s\033[0m' 
          % (l, DATA_NUTS_ID[l]))
Data are availalble:
* NUTS level 0: ['NO', 'NL', 'IS', 'HR', 'IT', 'SK', 'RS', 'LI', 'AT', 'UK', 'FI', 'LT', 'HU', 'CH', 'ME', 'BE', 'LU', 'ES', 'DE', 'SE', 'SI', 'LV', 'PT', 'FR', 'DK', 'CZ', 'BG', 'EE']
* NUTS level 1: ['NO', 'NL', 'IS', 'IT', 'SK', 'RS', 'LI', 'AT', 'UK', 'FI', 'LT', 'HU', 'CH', 'ME', 'BE', 'LU', 'ES', 'SE', 'LV', 'PT', 'FR', 'DK', 'CZ', 'BG', 'EE']
* NUTS level 2: ['NO', 'NL', 'IS', 'IT', 'SK', 'RS', 'LI', 'AT', 'UK', 'FI', 'LT', 'HU', 'CH', 'ME', 'BE', 'LU', 'ES', 'SE', 'LV', 'PT', 'FR', 'DK', 'CZ', 'BG', 'EE']
* NUTS level 3: ['NO', 'NL', 'IS', 'IT', 'SK', 'RS', 'LI', 'UK', 'FI', 'LT', 'HU', 'CH', 'ME', 'BE', 'LU', 'ES', 'SE', 'LV', 'PT', 'FR', 'DK', 'CZ', 'BG', 'EE']

We can see for any given level, which country/datasets are missing:

In [23]:
MISSING_CTRY_ID = dict.fromkeys(LEVELS) 

print('Data are NOT availalble:') 
for l in LEVELS:
    MISSING_CTRY_ID.update({l: list(set(CTRY_ID[l]).difference(set(DATA_NUTS_ID[l])))})
    print('* NUTS level %s: \033[1m%s\033[0m' 
          % (l, MISSING_CTRY_ID[l]))
Data are NOT availalble:
* NUTS level 0: ['IE', 'AL', 'CY', 'EL', 'TR', 'PL', 'RO', 'MK', 'MT']
* NUTS level 1: ['SI', 'IE', 'AL', 'MK', 'CY', 'EL', 'TR', 'PL', 'DE', 'RO', 'HR', 'MT']
* NUTS level 2: ['SI', 'IE', 'AL', 'MK', 'CY', 'EL', 'TR', 'PL', 'DE', 'RO', 'HR', 'MT']
* NUTS level 3: ['SI', 'IE', 'AL', 'MK', 'CY', 'EL', 'TR', 'MT', 'PL', 'DE', 'RO', 'HR', 'AT']

We can also build (using the method time_series) the temporal evolution of the deaths rate for a given region (REG_ID) and store it in a time series (ts):

In [24]:
def time_series(df, geo, age = "TOTAL", sex = "T", week = 1, years = [2020,]):
    try:
        assert isinstance(df, (pd.Series,pd.DataFrame))
    except:
        raise IOError('wrong format for input dataset')
    if np.isscalar(week): 
        week = [week,]
    ts = pd.DataFrame(columns = [y for y in years], 
                      index = pd.Index(week))
    extract = (
        df.loc[(df["geo"]==geo) & (df["age"]==age) & (df["sex"]==sex)]
        .drop(columns=["geo","age","sex"])
    )
    for y in years:
        ts[y] = pd.Series(extract[["%sW%02d" % (y,w) for w in week]].values[0])
    return ts

REG_NAME = 'Lombardia'
REG_ID = nuts_data[2].loc[nuts_data[2]['NUTS_NAME']==REG_NAME, 'NUTS_ID'].values[0]

AGE, SEX = "TOTAL", "T"
ts = time_series(death_data, REG_ID, age = AGE, sex = SEX, 
                 week = WEEKS, years = RYEARS)

try: 
    assert ts is not None
except:
    print("\033[1m!!! No data available for the considered region !!!\033[0m")
else:
    print('Data available for unit %s (age: %s, sex: %s)' % (REG_ID, AGE, SEX))
ts is not None and ts.head(5)
Data available for unit ITC4 (age: TOTAL, sex: T)
Out[24]:
2016 2017 2018 2019 2020
1 1994.0 2873.0 2510.0 2242.0 2029.0
2 2094.0 2684.0 2428.0 2284.0 2104.0
3 1969.0 2549.0 2271.0 2306.0 2123.0
4 1922.0 2328.0 2297.0 2411.0 2051.0
5 1979.0 2254.0 2061.0 2303.0 2113.0

that we plot (using plot_oneversus) for quick exploration:

In [25]:
def plot_oneversus(df, index = None, one = None, versus = None,  
                   fig=None, ax=None, shp = (1,1), dpi=_DPI_,
                   xlabel='', ylabel='', title = '', xrottick = False, legend = None,                 
                   grid = False, suptitle = '', locator = None, formatter = None):    
    try:
        assert isinstance(df, (pd.Series,pd.DataFrame))
    except:
        raise IOError('wrong format for input timeseries')
    if ax is None:
        if shp in (None,[],()): shp = (1,1)
        if dpi is None:     fig, pax = mplt.subplots(*shp, constrained_layout=True)
        else:               fig, pax = mplt.subplots(*shp, dpi=dpi, constrained_layout=True)
        if isinstance(pax,np.ndarray):
            if pax.ndim == 1:    ax_ = pax[0]
            else:               ax_ = pax[0,0]
        else:
            ax_ = pax
    else:
        ax_, pax = ax, None
    if index is None:
        index = dat.index
    if one is not None:
        ax_.plot(df.loc[index,one], ls='-', lw=0.6, c='r', 
                 marker='v', markersize=6, fillstyle='none')
        next(ax_._get_lines.prop_cycler)
    if versus is None:
        versus = df.columns
        try:    versus.remote(one)
        except: pass
    ax_.plot(df.loc[index,versus], ls='None', marker='o', fillstyle='none')
    ax_.set_xlabel(xlabel), ax_.set_ylabel(ylabel)
    if grid is not False:       ax_.grid(linewidth=grid)
    if xrottick is not False:   ax_.tick_params(axis ='x', labelrotation=xrottick)
    if locator is not None:     ax_.xaxis.set_major_locator(locator)
    if formatter is not None:   ax_.xaxis.set_major_formatter(formatter)
    if legend is None:
        legend = [one]
        legend.extend(versus)
    ax_.legend(legend, fontsize='small')
    if title not in ('',None):  ax_.set_title(title,  fontsize='medium')
    if suptitle not in ('',None):       
        fig.suptitle(suptitle,  fontsize='medium')
    if pax is not None:
        return fig, pax
    
plot_oneversus(ts, index=WEEKS, one=YCUR, versus=CYEARS, dpi=100, grid = 0.5,
               suptitle="Monthly deaths figure in %s over the past %s years (age: %s, sex: %s)" 
                       % (REG_NAME, max(RYEARS)-min(RYEARS), AGE, SEX)
              )
Out[25]:
(<Figure size 600x400 with 1 Axes>,
 <matplotlib.axes._subplots.AxesSubplot at 0x145d2a690>)

Excess rate representation

Let's further select an extended period of interest (WEEKRES as a list of weeks index):

In [26]:
DATERES, WEEKRES = None, list(range(9,18)) # select what you are interested
if WEEKRES is None:
    try:
        WEEKRES = DATERES.isocalendar()[1]
    except:
        WEEKRES = [15]

The method estimate_excess_rate implements the death excess rate of deaths (method ) over a (couple of) weeks in the current year (YCUR) as the rate of increase w.r.t. the reference mean (BASE, although max or min could also be used) of deaths calculated over the previous years (from YSTART to YCUR excluded). w_death_data). The death excess rate is stored in the INC column of the output dataset (w_death_data):

In [27]:
def estimate_excess_rate(df, inc = "rinc", agg = "mean", week=1, ystart=2015, year=2020):
    try:
        assert isinstance(df,(pd.Series,pd.DataFrame))
    except:
        raise IOError('wrong format for input dataset')
    if np.isscalar(week): 
        week = [week,]
    col_drop = [col for col in death_data.columns 
                if re.search('W',col) is not None and int(col.split('W')[1]) not in week]
    data = df.drop(columns = col_drop)
    years = range(ystart, year)
    cols = [col for col in df.columns                               \
            if any([col.strip().endswith('W%02d' % w) for w in week]) \
            and int(col.split('W')[0]) in years]
    sagg = df[cols].agg(agg, axis='columns', skipna=False) # we cant ignore missing values!
    if len(week)>1:
        cols = [col for col in df.columns                               \
                if any([col.strip().endswith('W%02d' % w) for w in week]) \
                and col.startswith('%sW' % str(year))]
        syear = df[cols].mean(axis='columns', skipna=False) # take the mean 
    else:
        syear = df['%sW%02d' % (str(year),week[0])]
    data[inc] = 100 * syear.sub(sagg).div(sagg)    
    return data

BASE = "mean" # "max" # "min"
INC = 'rinc'
w_death_data = estimate_excess_rate(death_data, inc = INC, agg = BASE, 
                                  week = WEEKRES, ystart = YSTART, year = YCUR)

w_death_data.head()
Out[27]:
2020W17 2020W16 2020W15 2020W14 2020W13 2020W12 2020W11 2020W10 2020W09 2019W17 ... 2016W13 2016W12 2016W11 2016W10 2016W09 unit sex age geo rinc
0 791.0 903.0 910.0 924.0 897.0 927.0 856.0 887.0 905.0 785.0 ... 833.0 780.0 802.0 843.0 888.0 NR F TOTAL AT 5.627991
1 350.0 406.0 407.0 387.0 402.0 411.0 376.0 390.0 390.0 343.0 ... 379.0 347.0 384.0 380.0 396.0 NR F TOTAL AT1 2.534965
2 25.0 42.0 29.0 45.0 46.0 37.0 28.0 46.0 36.0 36.0 ... 35.0 36.0 46.0 32.0 38.0 NR F TOTAL AT11 9.958848
3 174.0 175.0 189.0 183.0 160.0 180.0 172.0 184.0 180.0 159.0 ... 185.0 148.0 183.0 171.0 175.0 NR F TOTAL AT12 -0.327664
4 151.0 189.0 189.0 159.0 196.0 194.0 176.0 160.0 174.0 148.0 ... 159.0 163.0 155.0 177.0 183.0 NR F TOTAL AT13 4.062910

5 rows × 50 columns

Let's generate (using the method build_table) the statistics for the entire population during the specific period (s_death_data):

In [28]:
def build_table(df, inc = 'rinc', key='NUTS_ID', age = "TOTAL", sex = "T", miss_id = {}):
    try:
        assert isinstance(df, (pd.Series,pd.DataFrame))
    except:
        raise IOError('wrong format for input dataset')
    col_drop = list(set(df.columns).difference(set(['geo',inc])))
    data_miss = pd.DataFrame({key: miss_id[0].copy(), inc: np.nan})
    return pd.concat([(
        df.loc[(df["age"]==age) & (df["sex"]==sex)]
        .copy()
        .drop(columns=col_drop)
        #.dropna(axis='index', subset=[inc])
        .rename(columns={'geo':key})
        ),
        data_miss
        ], 
        axis=0, ignore_index=True
    )

s_death_data = build_table(w_death_data, inc = INC, key=KEY, 
                           age = AGE, sex = SEX, 
                           miss_id=MISSING_CTRY_ID)

s_death_data.head(-10)
Out[28]:
NUTS_ID rinc
0 AT 8.559312
1 AT1 7.704735
2 AT11 11.530398
3 AT12 5.654882
4 AT13 9.145854
... ... ...
1059 UKL21 NaN
1060 UKL22 NaN
1061 UKL23 NaN
1062 UKL24 NaN
1063 UKM NaN

1064 rows × 2 columns

We assume completeness here, that is when an administrative unit $A$ is available at level $l$, all other units belonging to the same region than $A$ at previous level $l-1$ are also available at level $l$. We merge the information from both datasets nuts_data (where geometries are stored) and s_death_data (where excess rate INC is calculated) into a single dataset nuts_death_data:

In [29]:
def build_unit_level(nuts, death, levels = [0, 1, 2, 3], key='NUTS_ID', how='inner'): 
    try:
        assert (isinstance(nuts, dict) 
                and all([isinstance(df,(gpd.GeoSeries, gpd.GeoDataFrame)) for df in nuts.values()])
                and set(nuts.keys()).difference(set(levels)) == set())
    except:
        raise IOError('wrong format for input data dictionary')
    data = dict.fromkeys(levels)
    for l in levels:
        temp = (nuts[l]
                .merge(death, on=key, how=how)
                #.dropna(axis='index', subset=[inc]) 
               )
        temp = temp[~(temp.geometry.is_empty | temp.geometry.isna())]
        temp.geometry = temp.geometry.buffer(0) # deal with invalid geometry
        data.update({l: temp})
    return data

nuts_death_data = build_unit_level(nuts_data, s_death_data, 
                                   levels = LEVELS, key = KEY, how='right') 

nuts_death_data[3].head()
Out[29]:
id COAST_TYPE MOUNT_TYPE CNTR_CODE FID NUTS_ID NUTS_NAME LEVL_CODE URBN_TYPE geometry rinc
0 CZ052 3.0 4.0 CZ CZ052 CZ052 Královéhradecký kraj 3.0 2.0 POLYGON ((4752268.000 3079151.000, 4768791.000... -3.750885
1 CZ053 3.0 4.0 CZ CZ053 CZ053 Pardubický kraj 3.0 3.0 POLYGON ((4812157.000 2965656.000, 4782782.000... -5.853175
2 CZ063 3.0 4.0 CZ CZ063 CZ063 Kraj Vysočina 3.0 3.0 POLYGON ((4782782.000 2960975.000, 4775982.000... 7.531493
3 CZ064 3.0 4.0 CZ CZ064 CZ064 Jihomoravský kraj 3.0 2.0 POLYGON ((4841158.000 2932184.000, 4851728.000... -2.215154
4 CZ071 3.0 2.0 CZ CZ071 CZ071 Olomoucký kraj 3.0 2.0 POLYGON ((4849854.000 3042645.000, 4835136.000... 0.198965

However, we know that some countries are represented at higher level only (this also includes those small countries which do not have NUTS representation in the highest level, say Luxembourg for instance). In practice, we transfer (in the method build_unit_level) the information available about an administrative unit $A$ at level $l$ to the highest $l+1$. In doing so, we shall also pay special attention to the list of countries for which data are not available in the specific period considered here besides the missing ones, i.e. those not listed in the table (update of nuts_death_data):

In [30]:
def propagate_unit_level(data, levels = [0, 1, 2, 3], inc = 'rinc', key = 'NUTS_ID', miss_id = {}):
    try:
        assert (isinstance(data, dict) 
                and all([isinstance(df,(gpd.GeoSeries, gpd.GeoDataFrame)) for df in data.values()])
                and set(data.keys()).difference(set(levels)) == set())
    except:
        raise IOError('wrong format for input data dictionary')
    nan_ctry_id = copy.deepcopy(miss_id) or dict.fromkeys(levels) 
    for l in range(len(levels)):
        level = levels[l]
        nan_ctry_id[l].extend([_id[:2] 
                               for _id in data[level].loc[data[level][inc].isnull(), key].unique().tolist() 
                               if len(_id)==l+2])
        nan_ctry_id[l] = list(set(nan_ctry_id[l]))
    for ctry in nan_ctry_id[0]:
        level0 = levels[0]
        data_miss = data[level0][data[level0][key].str.startswith(ctry)].copy()
        data_miss.geometry = data_miss.geometry.buffer(0)
        for level in levels[1:]:
            data[level] = pd.concat([data[level].loc[~(data[level][key].str.startswith(ctry))], 
                                     data_miss
                                    ], 
                                    axis=0, ignore_index=True
                                   )
    for l in range(1,len(levels)):
        for ctry in nan_ctry_id[l]:
            if ctry in nan_ctry_id[l-1]:
                continue
            levelp = levels[l-1]
            data_miss = data[levelp].loc[data[levelp][key].str.startswith(ctry)].copy()
            data_miss.geometry = data_miss.geometry.buffer(0)
            for l_ in range(l,len(levels)):
                level_ = levels[l_]
                data[level_] = pd.concat([data[level_].loc[~(data[level_][key].str.startswith(ctry))], 
                                          data_miss 
                                         ], 
                                         axis=0, ignore_index=True
                                        )
    return data

nuts_death_data = propagate_unit_level(nuts_death_data, miss_id = MISSING_CTRY_ID,
                                       levels = LEVELS, inc = INC, key = KEY)
nuts_death_data[3].head(5)
Out[30]:
id COAST_TYPE MOUNT_TYPE CNTR_CODE FID NUTS_ID NUTS_NAME LEVL_CODE URBN_TYPE geometry rinc
0 CZ052 3.0 4.0 CZ CZ052 CZ052 Královéhradecký kraj 3.0 2.0 POLYGON ((4752268.000 3079151.000, 4768791.000... -3.750885
1 CZ053 3.0 4.0 CZ CZ053 CZ053 Pardubický kraj 3.0 3.0 POLYGON ((4812157.000 2965656.000, 4782782.000... -5.853175
2 CZ063 3.0 4.0 CZ CZ063 CZ063 Kraj Vysočina 3.0 3.0 POLYGON ((4782782.000 2960975.000, 4775982.000... 7.531493
3 CZ064 3.0 4.0 CZ CZ064 CZ064 Jihomoravský kraj 3.0 2.0 POLYGON ((4841158.000 2932184.000, 4851728.000... -2.215154
4 CZ071 3.0 2.0 CZ CZ071 CZ071 Olomoucký kraj 3.0 2.0 POLYGON ((4849854.000 3042645.000, 4835136.000... 0.198965

For display reason, we define a very coarse bounding box (EUclip, see for instance this visualisation) that is embedded in the NUTS bounding box (NUTSarea) so as to render the data in a smaller region:

In [31]:
EUmask = geometry.Polygon([#w: -25.5, s:35, e:30.2, n:71.3 in EPSG:4326 system
                         (-25.5, 30.2),
                         (35,  30.2),
                         (35,  71.3),
                         (-25.5, 71.3)
])
EUclip = gpd.GeoDataFrame(index=[0], geometry=[EUmask], crs='EPSG:4326').to_crs('EPSG:3035')

NUTSarea = gpd.GeoDataFrame(index=[0], geometry=[nuts_data[0].unary_union], crs='EPSG:3035')


f, ax = mplt.subplots(1, figsize=(14, 10))
EUclip.boundary.plot(ax=ax, color='r', label='clip area')
NUTSarea.boundary.plot(ax=ax, color='b', label='NUTS area')
ax.axes.get_xaxis().set_visible(False); ax.axes.get_yaxis().set_visible(False)
ax.set_title('Clipped domain of representation')
ax.legend()
mplt.show()

Following, we use the clip area to crop and represent (on a continuous scale) the results available on continental Europe only (crop) distributed at NUTS level LEVEL. Missing countries are also represented:

In [32]:
LEVEL = 3
crop = gpd.clip(nuts_death_data[LEVEL], EUclip)
# crop = nuts_death_data[LEVEL].intersection(EUclip.unary_union) 

f, ax = mplt.subplots(1, figsize=(20, 16))
crop.plot(column='rinc', ax=ax, cmap='hot', legend=True, # cmap = 'OrRd'
          missing_kwds={ "color": "lightgrey", "alpha": 0.2, "edgecolor": "grey", "hatch": "///"}
         )
ax.set_axis_off()
ax.set_title('Excess mortality (in %%) during weeks %s to %s, %s (age: %s, sex: %s) - NUTS level %s' 
             % (min(WEEKRES), max(WEEKRES), YCUR, AGE, SEX, LEVEL))
mplt.show()

For information, the NUTS data (at various levels, depending on the countries' availability at the selected representation level LEVEL) represented in the map above are listed in INREGIONS:

In [33]:
INREGIONS = s_death_data['NUTS_ID'].unique().tolist()

print("NUTS regions actually available for this selection: \033[1m%s\033[0m" % INREGIONS)
NUTS regions actually available for this selection: ['AT', 'AT1', 'AT11', 'AT12', 'AT13', 'AT2', 'AT21', 'AT22', 'AT3', 'AT31', 'AT32', 'AT33', 'AT34', 'BE', 'BE1', 'BE10', 'BE100', 'BE2', 'BE21', 'BE211', 'BE212', 'BE213', 'BE22', 'BE221', 'BE222', 'BE223', 'BE23', 'BE231', 'BE232', 'BE233', 'BE234', 'BE235', 'BE236', 'BE24', 'BE241', 'BE242', 'BE25', 'BE251', 'BE252', 'BE253', 'BE254', 'BE255', 'BE256', 'BE257', 'BE258', 'BE3', 'BE31', 'BE310', 'BE32', 'BE321', 'BE322', 'BE323', 'BE324', 'BE325', 'BE326', 'BE327', 'BE33', 'BE331', 'BE332', 'BE334', 'BE335', 'BE336', 'BE34', 'BE341', 'BE342', 'BE343', 'BE344', 'BE345', 'BE35', 'BE351', 'BE352', 'BE353', 'BG', 'BG3', 'BG31', 'BG311', 'BG312', 'BG313', 'BG314', 'BG315', 'BG32', 'BG321', 'BG322', 'BG323', 'BG324', 'BG325', 'BG33', 'BG331', 'BG332', 'BG333', 'BG334', 'BG34', 'BG341', 'BG342', 'BG343', 'BG344', 'BG4', 'BG41', 'BG411', 'BG412', 'BG413', 'BG414', 'BG415', 'BG42', 'BG421', 'BG422', 'BG423', 'BG424', 'BG425', 'CH', 'CH0', 'CH01', 'CH011', 'CH012', 'CH013', 'CH02', 'CH021', 'CH022', 'CH023', 'CH024', 'CH025', 'CH03', 'CH031', 'CH032', 'CH033', 'CH04', 'CH040', 'CH05', 'CH051', 'CH052', 'CH053', 'CH054', 'CH055', 'CH056', 'CH057', 'CH06', 'CH061', 'CH062', 'CH063', 'CH064', 'CH065', 'CH066', 'CH07', 'CH070', 'CZ', 'CZ0', 'CZ01', 'CZ010', 'CZ02', 'CZ020', 'CZ03', 'CZ031', 'CZ032', 'CZ04', 'CZ041', 'CZ042', 'CZ05', 'CZ051', 'CZ052', 'CZ053', 'CZ06', 'CZ063', 'CZ064', 'CZ07', 'CZ071', 'CZ072', 'CZ08', 'CZ080', 'DE', 'DK', 'DK0', 'DK01', 'DK011', 'DK012', 'DK013', 'DK014', 'DK02', 'DK021', 'DK022', 'DK03', 'DK031', 'DK032', 'DK04', 'DK041', 'DK042', 'DK05', 'DK050', 'EE', 'EE0', 'EE00', 'EE001', 'EE004', 'EE006', 'EE007', 'EE008', 'ES', 'ES1', 'ES11', 'ES111', 'ES112', 'ES113', 'ES114', 'ES12', 'ES120', 'ES13', 'ES130', 'ES2', 'ES21', 'ES211', 'ES212', 'ES213', 'ES22', 'ES220', 'ES23', 'ES230', 'ES24', 'ES241', 'ES242', 'ES243', 'ES3', 'ES30', 'ES300', 'ES4', 'ES41', 'ES411', 'ES412', 'ES413', 'ES414', 'ES415', 'ES416', 'ES417', 'ES418', 'ES419', 'ES42', 'ES421', 'ES422', 'ES423', 'ES424', 'ES425', 'ES43', 'ES431', 'ES432', 'ES5', 'ES51', 'ES511', 'ES512', 'ES513', 'ES514', 'ES52', 'ES521', 'ES522', 'ES523', 'ES53', 'ES531', 'ES532', 'ES533', 'ES6', 'ES61', 'ES611', 'ES612', 'ES613', 'ES614', 'ES615', 'ES616', 'ES617', 'ES618', 'ES62', 'ES620', 'ES63', 'ES630', 'ES64', 'ES640', 'ES7', 'ES70', 'ES703', 'ES704', 'ES705', 'ES706', 'ES707', 'ES708', 'ES709', 'FI', 'FI1', 'FI19', 'FI193', 'FI194', 'FI195', 'FI196', 'FI197', 'FI1B', 'FI1B1', 'FI1C', 'FI1C1', 'FI1C2', 'FI1C3', 'FI1C4', 'FI1C5', 'FI1D', 'FI1D1', 'FI1D2', 'FI1D3', 'FI1D5', 'FI1D7', 'FI1D8', 'FI1D9', 'FI2', 'FI20', 'FI200', 'FR', 'FR1', 'FR10', 'FR101', 'FR102', 'FR103', 'FR104', 'FR105', 'FR106', 'FR107', 'FR108', 'FRB', 'FRB0', 'FRB01', 'FRB02', 'FRB03', 'FRB04', 'FRB05', 'FRB06', 'FRC', 'FRC1', 'FRC11', 'FRC12', 'FRC13', 'FRC14', 'FRC2', 'FRC21', 'FRC22', 'FRC23', 'FRC24', 'FRD', 'FRD1', 'FRD11', 'FRD12', 'FRD13', 'FRD2', 'FRD21', 'FRD22', 'FRE', 'FRE1', 'FRE11', 'FRE12', 'FRE2', 'FRE21', 'FRE22', 'FRE23', 'FRF', 'FRF1', 'FRF11', 'FRF12', 'FRF2', 'FRF21', 'FRF22', 'FRF23', 'FRF24', 'FRF3', 'FRF31', 'FRF32', 'FRF33', 'FRF34', 'FRG', 'FRG0', 'FRG01', 'FRG02', 'FRG03', 'FRG04', 'FRG05', 'FRH', 'FRH0', 'FRH01', 'FRH02', 'FRH03', 'FRH04', 'FRI', 'FRI1', 'FRI11', 'FRI12', 'FRI13', 'FRI14', 'FRI15', 'FRI2', 'FRI21', 'FRI22', 'FRI23', 'FRI3', 'FRI31', 'FRI32', 'FRI33', 'FRI34', 'FRJ', 'FRJ1', 'FRJ11', 'FRJ12', 'FRJ13', 'FRJ14', 'FRJ15', 'FRJ2', 'FRJ21', 'FRJ22', 'FRJ23', 'FRJ24', 'FRJ25', 'FRJ26', 'FRJ27', 'FRJ28', 'FRK', 'FRK1', 'FRK11', 'FRK12', 'FRK13', 'FRK14', 'FRK2', 'FRK21', 'FRK22', 'FRK23', 'FRK24', 'FRK25', 'FRK26', 'FRK27', 'FRK28', 'FRL', 'FRL0', 'FRL01', 'FRL02', 'FRL03', 'FRL04', 'FRL05', 'FRL06', 'FRM', 'FRM0', 'FRM01', 'FRM02', 'FRY', 'FRY1', 'FRY10', 'FRY2', 'FRY20', 'FRY3', 'FRY30', 'FRY4', 'FRY40', 'FRY5', 'FRY50', 'HR', 'HU', 'HU1', 'HU11', 'HU110', 'HU12', 'HU120', 'HU2', 'HU21', 'HU211', 'HU212', 'HU213', 'HU22', 'HU221', 'HU222', 'HU223', 'HU23', 'HU231', 'HU232', 'HU233', 'HU3', 'HU31', 'HU311', 'HU312', 'HU313', 'HU32', 'HU321', 'HU322', 'HU323', 'HU33', 'HU331', 'HU332', 'HU333', 'HUX', 'HUXX', 'HUXXX', 'IS', 'IS0', 'IS00', 'IS001', 'IS002', 'IT', 'ITC', 'ITC1', 'ITC11', 'ITC12', 'ITC13', 'ITC14', 'ITC15', 'ITC16', 'ITC17', 'ITC18', 'ITC2', 'ITC20', 'ITC3', 'ITC31', 'ITC32', 'ITC33', 'ITC34', 'ITC4', 'ITC41', 'ITC42', 'ITC43', 'ITC44', 'ITC46', 'ITC47', 'ITC48', 'ITC49', 'ITC4A', 'ITC4B', 'ITC4C', 'ITC4D', 'ITF', 'ITF1', 'ITF11', 'ITF12', 'ITF13', 'ITF14', 'ITF2', 'ITF21', 'ITF22', 'ITF3', 'ITF31', 'ITF32', 'ITF33', 'ITF34', 'ITF35', 'ITF4', 'ITF43', 'ITF44', 'ITF45', 'ITF46', 'ITF47', 'ITF48', 'ITF5', 'ITF51', 'ITF52', 'ITF6', 'ITF61', 'ITF62', 'ITF63', 'ITF64', 'ITF65', 'ITG', 'ITG1', 'ITG11', 'ITG12', 'ITG13', 'ITG14', 'ITG15', 'ITG16', 'ITG17', 'ITG18', 'ITG19', 'ITG2', 'ITG25', 'ITG26', 'ITG27', 'ITG28', 'ITG29', 'ITG2A', 'ITG2B', 'ITG2C', 'ITH', 'ITH1', 'ITH10', 'ITH2', 'ITH20', 'ITH3', 'ITH31', 'ITH32', 'ITH33', 'ITH34', 'ITH35', 'ITH36', 'ITH37', 'ITH4', 'ITH41', 'ITH42', 'ITH43', 'ITH44', 'ITH5', 'ITH51', 'ITH52', 'ITH53', 'ITH54', 'ITH55', 'ITH56', 'ITH57', 'ITH58', 'ITH59', 'ITI', 'ITI1', 'ITI11', 'ITI12', 'ITI13', 'ITI14', 'ITI15', 'ITI16', 'ITI17', 'ITI18', 'ITI19', 'ITI1A', 'ITI2', 'ITI21', 'ITI22', 'ITI3', 'ITI31', 'ITI32', 'ITI33', 'ITI34', 'ITI35', 'ITI4', 'ITI41', 'ITI42', 'ITI43', 'ITI44', 'ITI45', 'LI', 'LI0', 'LI00', 'LI000', 'LT', 'LT0', 'LT01', 'LT011', 'LT02', 'LT021', 'LT022', 'LT023', 'LT024', 'LT025', 'LT026', 'LT027', 'LT028', 'LT029', 'LU', 'LU0', 'LU00', 'LU000', 'LV', 'LV0', 'LV00', 'LV003', 'LV005', 'LV006', 'LV007', 'LV008', 'LV009', 'LVX', 'LVXX', 'LVXXX', 'ME', 'ME0', 'ME00', 'ME000', 'NL', 'NL1', 'NL11', 'NL111', 'NL112', 'NL113', 'NL12', 'NL124', 'NL125', 'NL126', 'NL13', 'NL131', 'NL132', 'NL133', 'NL2', 'NL21', 'NL211', 'NL212', 'NL213', 'NL22', 'NL221', 'NL224', 'NL225', 'NL226', 'NL23', 'NL230', 'NL3', 'NL31', 'NL310', 'NL32', 'NL321', 'NL323', 'NL324', 'NL325', 'NL327', 'NL328', 'NL329', 'NL33', 'NL332', 'NL333', 'NL337', 'NL33A', 'NL33B', 'NL33C', 'NL34', 'NL341', 'NL342', 'NL4', 'NL41', 'NL411', 'NL412', 'NL413', 'NL414', 'NL42', 'NL421', 'NL422', 'NL423', 'NO', 'NO0', 'NO01', 'NO011', 'NO012', 'NO02', 'NO021', 'NO022', 'NO03', 'NO031', 'NO032', 'NO033', 'NO034', 'NO04', 'NO041', 'NO042', 'NO043', 'NO05', 'NO051', 'NO052', 'NO053', 'NO06', 'NO060', 'NO061', 'NO062', 'NO07', 'NO071', 'NO072', 'NO073', 'PT', 'PT1', 'PT11', 'PT111', 'PT112', 'PT119', 'PT11A', 'PT11B', 'PT11C', 'PT11D', 'PT11E', 'PT15', 'PT150', 'PT16', 'PT16B', 'PT16D', 'PT16E', 'PT16F', 'PT16G', 'PT16H', 'PT16I', 'PT16J', 'PT17', 'PT170', 'PT18', 'PT181', 'PT184', 'PT185', 'PT186', 'PT187', 'PT2', 'PT20', 'PT200', 'PT3', 'PT30', 'PT300', 'PTX', 'PTXX', 'PTXXX', 'RS', 'RS1', 'RS11', 'RS110', 'RS12', 'RS121', 'RS122', 'RS123', 'RS124', 'RS125', 'RS126', 'RS127', 'RS2', 'RS21', 'RS211', 'RS212', 'RS213', 'RS214', 'RS215', 'RS216', 'RS217', 'RS218', 'RS22', 'RS221', 'RS222', 'RS223', 'RS224', 'RS225', 'RS226', 'RS227', 'RS228', 'RS229', 'SE', 'SE1', 'SE11', 'SE110', 'SE12', 'SE121', 'SE122', 'SE123', 'SE124', 'SE125', 'SE2', 'SE21', 'SE211', 'SE212', 'SE213', 'SE214', 'SE22', 'SE221', 'SE224', 'SE23', 'SE231', 'SE232', 'SE3', 'SE31', 'SE311', 'SE312', 'SE313', 'SE32', 'SE321', 'SE322', 'SE33', 'SE331', 'SE332', 'SI', 'SK', 'SK0', 'SK01', 'SK010', 'SK02', 'SK021', 'SK022', 'SK023', 'SK03', 'SK031', 'SK032', 'SK04', 'SK041', 'SK042', 'UK', 'UKC', 'UKC1', 'UKC11', 'UKC12', 'UKC13', 'UKC14', 'UKC2', 'UKC21', 'UKC22', 'UKC23', 'UKD', 'UKD1', 'UKD11', 'UKD12', 'UKD3', 'UKD33', 'UKD34', 'UKD35', 'UKD36', 'UKD37', 'UKD4', 'UKD41', 'UKD42', 'UKD44', 'UKD45', 'UKD46', 'UKD47', 'UKD6', 'UKD61', 'UKD62', 'UKD63', 'UKD7', 'UKD71', 'UKD72', 'UKD73', 'UKD74', 'UKE', 'UKE1', 'UKE11', 'UKE12', 'UKE13', 'UKE2', 'UKE21', 'UKE22', 'UKE3', 'UKE31', 'UKE32', 'UKE4', 'UKE41', 'UKE42', 'UKE44', 'UKE45', 'UKF', 'UKF1', 'UKF11', 'UKF12', 'UKF13', 'UKF14', 'UKF15', 'UKF16', 'UKF2', 'UKF21', 'UKF22', 'UKF24', 'UKF25', 'UKF3', 'UKF30', 'UKG', 'UKG1', 'UKG11', 'UKG12', 'UKG13', 'UKG2', 'UKG21', 'UKG22', 'UKG23', 'UKG24', 'UKG3', 'UKG31', 'UKG32', 'UKG33', 'UKG36', 'UKG37', 'UKG38', 'UKG39', 'UKH', 'UKH1', 'UKH11', 'UKH12', 'UKH14', 'UKH15', 'UKH16', 'UKH17', 'UKH2', 'UKH21', 'UKH23', 'UKH24', 'UKH25', 'UKH3', 'UKH31', 'UKH32', 'UKH34', 'UKH35', 'UKH36', 'UKH37', 'UKI', 'UKI3', 'UKI31', 'UKI32', 'UKI33', 'UKI34', 'UKI4', 'UKI41', 'UKI42', 'UKI43', 'UKI44', 'UKI45', 'UKI5', 'UKI51', 'UKI52', 'UKI53', 'UKI54', 'UKI6', 'UKI61', 'UKI62', 'UKI63', 'UKI7', 'UKI71', 'UKI72', 'UKI73', 'UKI74', 'UKI75', 'UKJ', 'UKJ1', 'UKJ11', 'UKJ12', 'UKJ13', 'UKJ14', 'UKJ2', 'UKJ21', 'UKJ22', 'UKJ25', 'UKJ26', 'UKJ27', 'UKJ28', 'UKJ3', 'UKJ31', 'UKJ32', 'UKJ34', 'UKJ35', 'UKJ36', 'UKJ37', 'UKJ4', 'UKJ41', 'UKJ43', 'UKJ44', 'UKJ45', 'UKJ46', 'UKK', 'UKK1', 'UKK11', 'UKK12', 'UKK13', 'UKK14', 'UKK15', 'UKK2', 'UKK21', 'UKK22', 'UKK23', 'UKK3', 'UKK30', 'UKK4', 'UKK41', 'UKK42', 'UKK43', 'UKL', 'UKL1', 'UKL11', 'UKL12', 'UKL13', 'UKL14', 'UKL15', 'UKL16', 'UKL17', 'UKL18', 'UKL2', 'UKL21', 'UKL22', 'UKL23', 'UKL24', 'UKM', 'UKN', 'IE', 'AL', 'CY', 'EL', 'TR', 'PL', 'RO', 'MK', 'MT']

Temporal series

Likewise the Statistics Explained article on death statistics, we actually the death excess rate on a weekly basis. The reference shall still be taken as the mean over the previous years starting from 201. For reminder:

In [34]:
BASE = "mean" # "max" # "min"
INC = 'rinc'
AGE, SEX = "TOTAL", "T"
YSTART, YCUR = 2016, 2020 

then we use the previously defined functions to generate the desired datasets (data), once per single week (WEEKRES) and per single NUTS level (LEVELS):

In [35]:
data = dict.fromkeys(WEEKRES)

for w in WEEKRES:
    temp = estimate_excess_rate(death_data, inc = INC, agg = BASE, 
                                week = w, ystart = YSTART, year = YCUR)
    temp = build_table(temp, inc = INC, key=KEY,
                       age = AGE, sex = SEX, 
                       miss_id = MISSING_CTRY_ID)
    temp = build_unit_level(nuts_data, temp, 
                            levels = LEVELS, key = KEY, how='right') 
    data[w] = propagate_unit_level(temp, levels = LEVELS, 
                                   inc = INC, key = KEY, 
                                   miss_id = MISSING_CTRY_ID)

We display the different estimations together:

In [36]:
LEVEL = 3

ncols, height = 2, 45
nrows = int(np.ceil(len(WEEKRES) / ncols))
f, ax = mplt.subplots(nrows, ncols, constrained_layout=True,
                      figsize=(int(height*ncols/nrows), height))
nax = nrows * ncols

for i, w in enumerate(WEEKRES):
    x, y = int(i/ncols), i%ncols
    crop = gpd.clip( data[w][LEVEL], EUclip)
    crop.plot(column='rinc', ax=ax[x][y], cmap='OrRd', legend=False,
              missing_kwds={ "color": "lightgrey", "alpha": 0.02, "edgecolor": "grey", "hatch": "///"}
             )
    day = '%s-W%s' % (YCUR, w)
    mon, sun = [datetime.datetime.strptime(day + '-%s' % d, '%G-W%V-%u').strftime("%d %b") for d in [1,7]]
    ax[x][y].set_axis_off()
    ax[x][y].set_title('Deaths in week %s - %s %s' % (mon, sun, YCUR)) 

for j in range(i+1, nax):
    x, y = int(j/ncols), j%ncols
    ax[x][y].set_axis_off()
    ax[x][y].axes.get_xaxis().set_visible(False); ax[x][y].axes.get_yaxis().set_visible(False)

y_title_pos = ax[0][0].get_position().get_points()[1][1]+(1/nrows)*0.15
f.suptitle('Excess mortality (in %%) during weeks %s to %s, %s (age: %s, sex: %s) - NUTS level %s' 
             % (min(WEEKRES), max(WEEKRES), YCUR, AGE, SEX, LEVEL), y=y_title_pos, fontsize='xx-large')
mplt.show()

Let's represent these data "all together".

First, we anticipate that we will reproject the data to lat/lon reference system, so we do it once for all:

In [37]:
for w in WEEKRES:
    # for level in LEVELS:
    data[w][LEVEL] = data[w][LEVEL].to_crs('EPSG:4326')

We first build a colormap based on the actual values and whose color values follow the color code adopted in the Statistics Explained article:

In [38]:
colors = ((238,208,183), (227,168,127), (219,131,83), (211,98,42), (182,0,0))
colors_index = [nuts_death_data[LEVEL][INC].min(), 100, 200, 300, 400]
vmin = min([int(np.floor(data[w][LEVEL][INC].min())) for w in WEEKRES])
vmax = max([int(np.floor(data[w][LEVEL][INC].max())) for w in WEEKRES])

cmap_nuts = bcm.StepColormap(colors=colors,index=colors_index,
                            vmin=vmin, 
                            vmax=vmax,
                            caption='death excess rate'
                           )
cmap_nuts
Out[38]:
-100900

Following, we can represent the choropleths (with this common colormap) for the different weeks:

In [39]:
m = folium.Map((52.3,8.0), zoom_start=4)

def style_function(f_rinc, feature):
    _id = f_rinc[feature['properties']['NUTS_ID']]
    return {
        'color':'black',
        'fillColor': '#gray',
        'fillOpacity': 0.05,
        'weight': 0.1,
    } if (_id is None or np.isnan(_id)) else {
        'color':'black',
        'fillColor': cmap_nuts(_id),
        'fillOpacity': 0.9,
        'weight': 0.3,
    } 

def highlight_function(feature):
    return {'fillColor': '#000000', 
            'color':'#000000',
            'fillOpacity': 0.6, 
            'weight': 0.1}

def tooltip():
    return folium.GeoJsonTooltip(fields=['NUTS_ID',INC],
            aliases=['NUTS','% death excess'],
            style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
            sticky=True
        )

for i, w in enumerate(WEEKRES):
    day = '%s-W%s' % (YCUR, w)
    mon, sun = [datetime.datetime.strptime(day + '-%s' % d, '%G-W%V-%u').strftime("%d %b") for d in [1,7]]
    folium.GeoJson(
        data = data[w][LEVEL].to_json(),
        name = '%s - %s (#%s)' % (mon, sun, w),
        style_function = functools.partial(style_function, data[w][LEVEL].set_index(KEY)[INC]),
        highlight_function = highlight_function,
        tooltip = tooltip(),
        show = True if i==0 else False
    ).add_to(m)

folium.LayerControl('topleft', collapsed=False).add_to(m)
cmap_nuts.add_to(m)

m
Out[39]:

Using instead the Choropleth class of folium:

In [40]:
LEVEL = 3
W = 9

m = folium.Map((52.3,8.0), zoom_start=4)

choro = folium.Choropleth(
    geo_data = data[W][LEVEL].to_json(),
    data = data[W][LEVEL],
    name = 'Week #%s' % W,
    columns=['NUTS_ID', INC],
    fill_color='OrRd',
    fill_opacity = 0.9, 
    line_opacity = 0.9,
    nan_fill_color='grey',
    nan_fill_opacity=0.05,
    nan_line_opacity = 0.05,
    key_on='feature.properties.NUTS_ID',
    show=True
).add_to(m)

# choro.geojson.add_child( tooltip() )

folium.LayerControl(collapsed=False).add_to(m)
m
Out[40]:

Say that the packages ipyleaflet is available, you can also run:

In [41]:
import ipyleaflet

LEVEL = 3

geo_data = json.loads(nuts_death_data[3]
                      .dropna(axis='index', subset=[INC]) 
                      .to_crs('EPSG:4326')
                      .to_json()
                     )
choro_data = dict(zip(nuts_death_data[LEVEL].index.astype(str).tolist(), 
                      nuts_death_data[LEVEL][INC]))

m = ipyleaflet.Map(center = (52.3,8.0), zoom = 4)

layer = ipyleaflet.Choropleth(
    geo_data = geo_data,
    choro_data = choro_data,
    colormap = cmap_nuts,
    on_key = 'id',
    style = {'fillOpacity': 1, 'dashArray': '5, 5'},
    name = 'Excess mortality (in %%) during weeks %s to %s, %s (age: %s, sex: %s) - NUTS level %s' 
             % (min(WEEKRES), max(WEEKRES), YCUR, AGE, SEX, LEVEL)
)

m.add_layer(layer)

m
In [ ]: